Cross-Lingual Voice Conversion
Abstract
Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conversion by discussing open research questions, presenting new methods, and performing comparisons with state-of-the-art techniques. In the training stage, a phonetic hidden Markov model (HMM) based automatic segmentation and alignment method is developed for cross-lingual applications; it supports both text-independent and text-dependent modes. The vocal tract transformation function is estimated using weighted speech frame mapping. By adjusting the weights, similarity to the target voice and output quality can be balanced according to the requirements of the cross-lingual voice conversion application. A context-matching algorithm is developed to reduce one-to-many mapping problems and enable non-parallel training. Another set of improvements is proposed for prosody transformation, including stylistic modeling and transformation of pitch and speaking rate. A high-quality cross-lingual voice conversion database is designed for the evaluation of the proposed methods. The database consists of recordings from bilingual speakers of American English and Turkish. It is employed in objective and subjective evaluations, and in case studies for testing new ideas in cross-lingual voice conversion.
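The abstract does not specify how the weighted speech frame mapping is computed, so the following is only an illustrative sketch of the general idea, not the method of the study. It assumes aligned source/target training frames are already available and maps an input frame to a distance-weighted average of the target frames. The function name `weighted_frame_mapping`, the softmax weighting, and the `sharpness` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def weighted_frame_mapping(source_frame, src_train, tgt_train, sharpness=1.0):
    """Map one source frame to a weighted average of target training frames.

    Assumption: src_train[i] and tgt_train[i] are already-aligned frame pairs
    (e.g. spectral feature vectors). `sharpness` is a hypothetical knob that
    controls how strongly the weights concentrate on the closest frames.
    """
    source_frame = np.asarray(source_frame, dtype=float)
    src_train = np.asarray(src_train, dtype=float)
    tgt_train = np.asarray(tgt_train, dtype=float)

    # Euclidean distance from the input frame to every source training frame.
    dists = np.linalg.norm(src_train - source_frame, axis=1)

    # Softmax over negative distances; subtracting the minimum keeps exp() stable.
    weights = np.exp(-sharpness * (dists - dists.min()))
    weights /= weights.sum()

    # Converted frame: weighted combination of the paired target frames.
    return weights @ tgt_train, weights


if __name__ == "__main__":
    src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    tgt = [[10.0, 10.0], [20.0, 10.0], [10.0, 20.0]]
    for s in (0.0, 1.0, 1000.0):
        converted, w = weighted_frame_mapping([0.0, 0.0], src, tgt, sharpness=s)
        print(s, converted, w)
```

In this sketch, a large `sharpness` approaches picking the target frame paired with the single nearest source frame, while `sharpness=0` averages all target frames equally. How such a weight setting trades similarity to the target voice against output quality in the actual system is not stated in the abstract.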
Similar resources
A phonetic assessment of cross-language voice conversion
Cross-language voice conversion maps the speech of speaker S1 in language L1 to the voice of speaker S2 using knowledge only of how S2 speaks a different language L2. This mapping is usually performed using speech material from S1 and S2 that has been deemed “equivalent” in either acoustic or phonetic terms. This study investigates the issue of equivalence in more detail, and contrasts the perf...
A flexible and modular crosslingual voice conversion system
A cross-lingual voice conversion system aims at modifying the timbral structure of recorded sentences from a source speaker, in order to obtain processed sentences which are perceived as the same sentences uttered by a target speaker. This work presents the cross-lingual voice conversion problem as a network of related sub-problems and discusses several techniques for solving each of these sub-pr...
Frame alignment method for cross-lingual voice conversion
Most of the existing voice conversion methods calculate the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. The alignment of the phonetically equivalent source and target frames is problematic when the training corpus available is not parallel, although this is the most realistic situation. The alignment task is even more difficult ...
Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression
Voice conversion aims at converting speech from one speaker to sound as if it was spoken by another specific speaker. The most popular voice conversion approach based on Gaussian mixture modeling tends to suffer either from model overfitting or oversmoothing. To overcome the shortcomings of the traditional approach, we recently proposed to use dynamic kernel partial least squares (DKPLS) regres...
Voice Conversion of Non-aligned Data using Unit Selection
Voice conversion (VC) technology makes it possible to transform the voice of a source speaker so that it is perceived as the voice of a target speaker. One of the applications of VC is speech-to-speech translation, where the voice has to convey not only what is said but also who the speaker is. This paper introduces the different methods submitted by UPC to the TC-STAR second evaluation cam...